AITopics | conservative estimate

Summary: This paper proposes a new algorithm that helps stabilize off-policy Q-learning. The idea is to introduce approximate Bellman updates that are constrained to actions sampled only from the support of the training data distribution. The paper shows that the main source of instability is bootstrapping error: the bootstrapping process might use actions that do not lie in the training data distribution. This work shows a way to mitigate this issue.
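The idea in the summary can be illustrated with a small tabular sketch. This is a toy under stated assumptions, not the paper's algorithm: the function name, the `(state, action, reward, next_state, done)` tuple layout, and the `init` parameter are hypothetical, and "support" is approximated crudely as the set of actions observed in each state of the dataset.

```python
from collections import defaultdict


def support_constrained_q_learning(dataset, n_actions, gamma=0.99, lr=0.5,
                                   sweeps=200, init=0.0):
    """Tabular Q-learning on a fixed dataset with a support-restricted backup.

    dataset: list of (state, action, reward, next_state, done) tuples.
    All names here are illustrative assumptions, not taken from the paper.
    """
    # Actions actually observed in each state: a crude stand-in for
    # "the support of the training data distribution".
    support = defaultdict(set)
    for s, a, _, _, _ in dataset:
        support[s].add(a)

    # `init` lets us simulate over-estimated values for unseen actions.
    q = defaultdict(lambda: [init] * n_actions)

    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            allowed = support.get(s_next, ())
            if done or not allowed:
                bootstrap = 0.0
            else:
                # Maximize only over actions seen in s_next, so the target
                # never bootstraps from an out-of-distribution action.
                bootstrap = max(q[s_next][b] for b in allowed)
            q[s][a] += lr * (r + gamma * bootstrap - q[s][a])
    return q


# Two-step chain: state 0 -> state 1 -> terminal. Action 1 is never observed.
data = [(0, 0, 0.0, 1, False), (1, 0, 1.0, 1, True)]
q = support_constrained_q_learning(data, n_actions=2, init=10.0)
```

In this toy run the unseen action in state 1 keeps its inflated initial value, but because the backup only considers observed actions, that value never leaks into the estimate for state 0.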

bootstrapping error reduction, constraint, off-policy q-learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

Nangia, Nikita, Bowman, Samuel R.

arXiv.org Artificial Intelligence, Jun-1-2019

The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state of the art at the time of writing (May 24, 2019). Here, we measure human performance on the benchmark, in order to learn whether significant headroom remains for further progress. We provide a conservative estimate of human performance on the benchmark through crowdsourcing: Our annotators are non-experts who must learn each task from a brief set of instructions and 20 examples. In spite of limited training, these annotators robustly outperform the state of the art on six of the nine GLUE tasks and achieve an average score of 87.1. Given the fast pace of progress however, the headroom we observe is quite limited. To reproduce the data-poor setting that our annotators must learn in, we also train the BERT model (Devlin et al., 2019) in limited-data regimes, and conclude that low-resource sentence classification remains a challenge for modern neural network approaches to text understanding.

annotator, machine learning, natural language, (20 more...)

1905.10425

Country: North America > United States (0.47)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

Adaptive Design of Experiments for Conservative Estimation of Excursion Sets

Azzimonti, Dario, Ginsbourger, David, Chevalier, Clément, Bect, Julien, Richet, Yann

arXiv.org Machine Learning, Jan-11-2017

We consider a Gaussian process model trained on few evaluations of an expensive-to-evaluate deterministic function and we study the problem of estimating a fixed excursion set of this function. We review the concept of conservative estimates, recently introduced in this framework, and, in particular, we focus on estimates based on Vorob'ev quantiles. We present a method that sequentially selects new evaluations of the function in order to reduce the uncertainty on such estimates. The sequential strategies are first benchmarked on artificial test cases generated from Gaussian process realizations in two and five dimensions, and then applied to two reliability engineering test cases.

conservative estimate, evaluation, test case, (15 more...)

1611.07256

Country: